TEZ-4549: Upgrade Hadoop Version to 3.4.1. #342

Merged (3 commits) on Dec 23, 2024

slfan1989
Contributor

@slfan1989 slfan1989 commented Apr 3, 2024

JIRA: TEZ-4549. Upgrade Hadoop Version to 3.4.0.

Hadoop 3.4.0 has been released, so this PR tries to upgrade Tez's Hadoop version to 3.4.0.

Local compilation successful.

[INFO] tez ................................................ SUCCESS [  2.206 s]
[INFO] hadoop-shim ........................................ SUCCESS [ 27.433 s]
[INFO] tez-api ............................................ SUCCESS [ 36.299 s]
[INFO] tez-build-tools .................................... SUCCESS [  0.233 s]
[INFO] tez-common ......................................... SUCCESS [  2.093 s]
[INFO] tez-runtime-internals .............................. SUCCESS [  8.123 s]
[INFO] tez-runtime-library ................................ SUCCESS [ 16.348 s]
[INFO] tez-mapreduce ...................................... SUCCESS [ 10.233 s]
[INFO] tez-examples ....................................... SUCCESS [  1.769 s]
[INFO] tez-dag ............................................ SUCCESS [ 18.068 s]
[INFO] tez-tests .......................................... SUCCESS [ 18.191 s]
[INFO] tez-ext-service-tests .............................. SUCCESS [  6.580 s]
[INFO] tez-ui ............................................. SUCCESS [ 46.579 s]
[INFO] tez-plugins ........................................ SUCCESS [  0.469 s]
[INFO] tez-protobuf-history-plugin ........................ SUCCESS [ 11.401 s]
[INFO] tez-yarn-timeline-history .......................... SUCCESS [  5.569 s]
[INFO] tez-yarn-timeline-history-with-acls ................ SUCCESS [  4.206 s]
[INFO] tez-yarn-timeline-cache-plugin ..................... SUCCESS [ 37.988 s]
[INFO] tez-yarn-timeline-history-with-fs .................. SUCCESS [  3.089 s]
[INFO] tez-history-parser ................................. SUCCESS [ 16.163 s]
[INFO] tez-aux-services ................................... SUCCESS [  8.207 s]
[INFO] tez-tools .......................................... SUCCESS [  0.103 s]
[INFO] tez-perf-analyzer .................................. SUCCESS [  0.057 s]
[INFO] tez-job-analyzer ................................... SUCCESS [  2.805 s]
[INFO] tez-javadoc-tools .................................. SUCCESS [  0.905 s]
[INFO] hadoop-shim-impls .................................. SUCCESS [  0.058 s]
[INFO] hadoop-shim-2.8 .................................... SUCCESS [  1.576 s]
[INFO] tez-dist ........................................... SUCCESS [ 30.981 s]
[INFO] Tez ................................................ SUCCESS [  0.287 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  05:18 min
[INFO] Finished at: 2024-04-03T22:58:57+08:00
[INFO] ------------------------------------------------------------------------

@tez-yetus

This comment was marked as outdated.

Member

@ayushtkn ayushtkn left a comment


We usually couple the Hadoop upgrade in Tez and Hive together; they have a history of not working when they are not on the same version. Every time we didn't, it created classpath issues when running queries in Hive. We had to quickly push the last Tez release to unblock Hive, since Hive with 3.3.6 wasn't compatible with Tez on hadoop-3.3.1 (some HTrace stuff).

Hive doesn't compile with Hadoop 3.4.0 as of now, so I think we should hold this until Hive gets that sorted.

@slfan1989
Contributor Author

> We usually couple the Hadoop upgrade in Tez and Hive together; they have a history of not working when they are not on the same version. Every time we didn't, it created classpath issues when running queries in Hive. We had to quickly push the last Tez release to unblock Hive, since Hive with 3.3.6 wasn't compatible with Tez on hadoop-3.3.1 (some HTrace stuff).
>
> Hive doesn't compile with Hadoop 3.4.0 as of now, so I think we should hold this until Hive gets that sorted.

Thank you for the explanation!

@Aggarwal-Raghav
Contributor

From a changes point of view, Hadoop 3.4.0 ships ZooKeeper 3.8.3, which uses logback. We have to explicitly exclude it from the Hadoop dependencies; otherwise it will lead to class loader issues. For example, in Hive, if the logback jar gets picked up first, hive-log4j2.properties won't be honored.
The same is mentioned in the description of apache/hadoop#6582.

@zhangbutao
Contributor

> From a changes point of view, Hadoop 3.4.0 ships ZooKeeper 3.8.3, which uses logback. We have to explicitly exclude it from the Hadoop dependencies; otherwise it will lead to class loader issues. For example, in Hive, if the logback jar gets picked up first, hive-log4j2.properties won't be honored. The same is mentioned in the description of apache/hadoop#6582.

@Aggarwal-Raghav We are trying to upgrade the Hadoop version to 3.4.0 in Hive in apache/hive#5500.
Any thoughts on refining that change (apache/hive#5500)?
Thanks.

@Aggarwal-Raghav
Contributor

I have added comments on the Hive PR apache/hive#5500.

@zhangbutao
Contributor

@slfan1989 Could you please change the Hadoop version to 3.4.1? Thx.

@ayushtkn changed the title from "TEZ-4549. Upgrade Hadoop Version to 3.4.0." to "TEZ-4549: Upgrade Hadoop Version to 3.4.1." on Dec 18, 2024
@tez-yetus

This comment was marked as outdated.

@ayushtkn
Member

ayushtkn commented Dec 18, 2024

I pushed a commit to this PR to bump the version to 3.4.1; if it passes, we can merge this.

EDIT: Guess it requires a rebase :-)

@slfan1989
Contributor Author

> @slfan1989 Could you please change the Hadoop version to 3.4.1? Thx.

@zhangbutao Thank you for your message, and sorry that I missed some information.

@ayushtkn Thank you for updating this PR again!

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 21m 9s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 20m 33s | master passed |
| +1 💚 | compile | 2m 14s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | compile | 2m 3s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javadoc | 1m 44s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 1m 9s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 4m 19s | the patch passed |
| +1 💚 | compile | 2m 15s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javac | 2m 15s | the patch passed |
| +1 💚 | compile | 2m 2s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javac | 2m 2s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 1m 9s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 1m 8s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 69m 1s | root in the patch failed. |
| +1 💚 | asflicense | 0m 44s | The patch does not generate ASF License warnings. |
| | | 130m 48s | |

| Reason | Tests |
| --- | --- |
| Failed junit tests | tez.analyzer.TestAnalyzer |
| | tez.tests.TestExternalTezServices |
| | tez.tests.TestExternalTezServicesErrors |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/3/artifact/out/Dockerfile |
| GITHUB PR | #342 |
| JIRA Issue | TEZ-4549 |
| Optional Tests | dupname asflicense javac javadoc unit xml compile |
| uname | Linux 26bda8684eb3 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / ca15119 |
| Default Java | Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/3/artifact/out/patch-unit-root.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/3/testReport/ |
| Max. process+thread count | 2101 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/3/console |
| versions | git=2.34.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog
Contributor

abstractdog commented Dec 19, 2024

Regarding the latest precommit failures:
TestAnalyzer is known to be flaky.
TestExternalTezServices and TestExternalTezServicesErrors need investigation.

[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 1.485 s <<< FAILURE! - in org.apache.tez.tests.TestExternalTezServices
[ERROR] org.apache.tez.tests.TestExternalTezServices  Time elapsed: 1.463 s  <<< ERROR!
java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/util/JacksonFeature
	at org.apache.tez.tests.TestExternalTezServices.setup(TestExternalTezServices.java:76)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.core.util.JacksonFeature
	at org.apache.tez.tests.TestExternalTezServices.setup(TestExternalTezServices.java:76)

[ERROR] org.apache.tez.tests.TestExternalTezServices  Time elapsed: 1.463 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.tez.tests.TestExternalTezServices.tearDown(TestExternalTezServices.java:111)

@ayushtkn
Member

@pjfanning / @slfan1989 by any chance, do you folks remember any change around Jackson in Hadoop 3.4.1 compared to 3.4.0? Any dependency removal or change?

@pjfanning
Contributor

> @pjfanning / @slfan1989 by any chance, do you folks remember any change around Jackson in Hadoop 3.4.1 compared to 3.4.0? Any dependency removal or change?

I don't know of anything. That JacksonFeature class is not used in Hadoop or Tez as far as I can see. The class was introduced in Jackson 2.12 and Hadoop has a dependency on Jackson 2.12 - we're stuck on that old version.
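
One way to check which Jackson jars a test JVM actually resolves is to ask the class loader directly. The following is a minimal sketch, not part of this PR: `ClasspathProbe` and its `locate` helper are hypothetical names used for illustration, and only JDK APIs are used.

```java
import java.security.CodeSource;

/** Reports which classpath entry, if any, would supply a given class. */
final class ClasspathProbe {

    static final String MISSING = "<not on classpath>";
    static final String NO_CODE_SOURCE = "<bootstrap or unknown source>";

    /** Returns the jar or directory a class loads from, or one of the marker strings. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null)
                    ? NO_CODE_SOURCE
                    : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return MISSING;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // Per the comment above, this class was introduced in Jackson 2.12.
            "com.fasterxml.jackson.core.util.JacksonFeature",
            "com.fasterxml.jackson.core.JsonFactory",
            "com.fasterxml.jackson.databind.ObjectMapper"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run on the failing module's test classpath, a missing `JacksonFeature` alongside a resolvable `ObjectMapper` would suggest that jackson-core is older than the jackson-databind next to it.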

@ayushtkn
Member

Hmm, it is something from HDFS: some jackson-databind transitive dependency seems to be missing. The entire trace is:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/util/JacksonFeature

	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:656)
	at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:558)
	at org.apache.hadoop.hdfs.server.blockmanagement.SlowPeerTracker.<clinit>(SlowPeerTracker.java:78)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.initSlowPeerTracker(DatanodeManager.java:373)
	at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.<init>(DatanodeManager.java:263)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.<init>(BlockManager.java:502)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:926)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:851)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1396)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:495)
	at org.apache.hadoop.hdfs.DFSTestUtil.formatNameNode(DFSTestUtil.java:256)
	at org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1158)
	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1042)
	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:974)
	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:594)
	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:533)
	at org.apache.tez.tests.ExternalTezServiceTestHelper.<init>(ExternalTezServiceTestHelper.java:63)
	at org.apache.tez.tests.TestExternalTezServices.setup(TestExternalTezServices.java:76)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.core.util.JacksonFeature
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 37 more

Looking at the dependency tree now, jackson-core 2.10 is getting pulled in by Avro:

ayushsaxena@ayushsaxena tez-ext-service-tests % mvn clean dependency:tree | grep jackson-core:jar -B4
[INFO] |  +- org.apache.commons:commons-configuration2:jar:2.10.1:compile
[INFO] |  +- org.apache.commons:commons-lang3:jar:3.12.0:compile
[INFO] |  +- org.apache.commons:commons-text:jar:1.10.0:compile
[INFO] |  +- org.apache.avro:avro:jar:1.9.2:compile
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.10.2:compile

Earlier, it was coming from the jackson-databind dependency defined in hadoop-common:

ayushsaxena@ayushsaxena tez-ext-service-tests % mvn clean dependency:tree | grep jackson-core:jar -B4
[INFO] |  |     +- org.apache.kerby:kerby-asn1:jar:1.0.1:compile
[INFO] |  |     \- org.apache.kerby:kerby-util:jar:1.0.1:compile
[INFO] |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.12.7.1:compile
[INFO] |  |  +- com.fasterxml.jackson.core:jackson-annotations:jar:2.12.7:compile
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-core:jar:2.12.7:compile
ayushsaxena@ayushsaxena tez-ext-service-tests % 

I am not sure why this started happening, but there was an Avro-related change in Hadoop:
https://github.com/apache/hadoop/pull/7007/files#diff-a74b4a65ab1f4aed61344787bb9654ba7e835154730653848c88ca11f9965dc0R301

Ideally, I feel we should exclude the Jackson dependency coming from Avro in Hadoop itself.
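
The mismatch between the two dependency trees above can be sketched as a simple version comparison. This is an illustration only, assuming the rule of thumb that jackson-databind needs a jackson-core with the same major version and at least the same minor version; `JacksonVersionCheck` and its methods are hypothetical names, not Jackson or Hadoop APIs.

```java
/** Compares Jackson module versions by their major.minor prefix. */
final class JacksonVersionCheck {

    /** Parses a version such as "2.12.7.1" into {2, 12}; later components are ignored. */
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /**
     * Assumed rule of thumb: core must share databind's major version
     * and be at least the same minor version.
     */
    static boolean coreSatisfiesDatabind(String databindVersion, String coreVersion) {
        int[] d = majorMinor(databindVersion);
        int[] c = majorMinor(coreVersion);
        return c[0] == d[0] && c[1] >= d[1];
    }

    public static void main(String[] args) {
        // Versions taken from the two dependency trees above.
        System.out.println("core 2.12.7 with databind 2.12.7.1: "
                + coreSatisfiesDatabind("2.12.7.1", "2.12.7"));
        System.out.println("core 2.10.2 with databind 2.12.7.1: "
                + coreSatisfiesDatabind("2.12.7.1", "2.10.2"));
    }
}
```

Under that assumption, the jackson-core 2.10.2 resolved through Avro fails the check against jackson-databind 2.12.7.1, which is consistent with `ObjectMapper` failing to find `JacksonFeature`.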

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
| --- | --- | --- | --- |
| +0 🆗 | reexec | 0m 14s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 3m 58s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 12m 18s | master passed |
| +1 💚 | compile | 2m 49s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | compile | 2m 32s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javadoc | 1m 48s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 1m 33s | master passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 32s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 4m 26s | the patch passed |
| +1 💚 | compile | 2m 49s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javac | 2m 49s | the patch passed |
| +1 💚 | compile | 2m 34s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| +1 💚 | javac | 2m 34s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 3s | The patch has no ill-formed XML file. |
| +1 💚 | javadoc | 1m 33s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 💚 | javadoc | 1m 31s | the patch passed with JDK Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| | | | _ Other Tests _ |
| +1 💚 | unit | 4m 53s | tez-ext-service-tests in the patch passed. |
| -1 ❌ | unit | 74m 13s | root in the patch failed. |
| +1 💚 | asflicense | 1m 8s | The patch does not generate ASF License warnings. |
| | | 120m 15s | |

| Reason | Tests |
| --- | --- |
| Failed junit tests | tez.analyzer.TestAnalyzer |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/4/artifact/out/Dockerfile |
| GITHUB PR | #342 |
| JIRA Issue | TEZ-4549 |
| Optional Tests | dupname asflicense javac javadoc unit xml compile |
| uname | Linux 7a118d32a48a 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / ca15119 |
| Default Java | Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-gaus1-0ubuntu222.04-ga |
| unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/4/artifact/out/patch-unit-root.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/4/testReport/ |
| Max. process+thread count | 2100 (vs. ulimit of 5500) |
| modules | C: tez-ext-service-tests . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-342/4/console |
| versions | git=2.34.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog
Contributor

abstractdog commented Dec 22, 2024

Seems like excluding Avro did the trick.
TestAnalyzer is a known flaky test:

java.lang.AssertionError: v2 : 00000[01]_1

so this LGTM

@abstractdog abstractdog self-requested a review December 22, 2024 09:07
Contributor

@abstractdog abstractdog left a comment


+1

@ayushtkn ayushtkn merged commit 198655f into apache:master Dec 23, 2024
4 checks passed